Performance Evaluation of Real-Time SchedulingAlgorithms for Multicore Electronic Control Units in Automotive ApplicationsINTRODUCTIONInternal Combustion Engines (ICE) used inautomobiles is responsible for the pollution because of their emission of GreenHouse Gases (GHG) into the environment.
The number of automobiles has beenincreasing with increase in population which led to the increase in pollution. Tocontrol these, many auto emission laws were introduced by government. In spiteof all these effort, use of ICE’s continued to be the cause of environmentalpollution. This fact combined with the advancements in the technology has becomethe reason for evolution of Electric Vehicles (EV) and Hybrid Electric Vehicles(HEV).
These categories of vehicles have reduced environmental pollution andthe dependency on fossil fuels for automobiles. EV uses electric power from thebattery as the major power source. However, introduction of EV involves manychallenges due to the limitations of performance and difference in technologiesbeing used. These challenges have become the motivation for the research ofimproving the technologies used and performance in EVs. Electric Control Units (ECU) are the embeddedsystems used in automobiles to control one or more electronic subsystems of thevehicle. The major components of an ECU includes a microcontroller which acts acore, memory for storing the control software, sensors for getting inputs, actuatorsfor outputs, and communication links for connecting to other ECUs. An ECU getsthe input (can be analog or digital) from the different sensors and computesthe output (driver or logic outputs) using the stored software. There aredifferent ECUs for the different systems in the automotive; few examplesinclude Body Control Module (BCM), Engine Control Module (ECM), Power TrainControl Module (PCM), and Vehicle Control Module (VCM).
Thus, the requirementof the modern automobiles to be safe, efficient and, comfortable is fulfilledby use of ECUs. But the customer needs more sophistication in terms of comfort,safety and infotainment, which increases the complexity of the software. Thisleads to the need of highly computational microcontrollers and also increasesthe number of ECUs being used. An Average ECU consumes 200mA which is around2.4W power; hence with increase in processing power and number of ECUs, thepower consumption is increased. Thus, the main objective in development ofelectric vehicles is to save energy whenever possible.NECSESSITYOF MULTICORE ECUsIn order to overcome the above issues, multipleexisting control systems are integrated into a single ECU with multiple cores.Since the software which were running on multiple ECUs are integrated to run onmultiple cores on single ECU, provides higher performance.
Also the number ofcommunication interfaces required to connect ECUs of different control systemsget reduced and so the traffic on the communication network. That is,introduction of multicore reduces the complexity of In-vehicle architecture. Applicationssuch as in engine controllers or infotainment may require higher performance.Without adding any additional hardware feature, these requirements can be metby parallelizing jobs on multiple cores. To improve the safety, the safetycritical applications can be run in redundant manner on multiple cores, and theoutput given by the majority of the cores is chosen to be the final output. Inaddition to these advantages, the number of ECUs gets reduced which in turnreduces the power consumption due to ECU.
IMPACTOF SOFTWARE ON ECUs Most of the systems in anautomobile are electronic control systems with microcontrollers to forefficient working. Thus, in an automobile, 30-40 percent of the total value iscontributed by the embedded system and software. The complexity of the softwaredepends on the criticality of the functionality being carried out. Thus, eitherincrease number of ECUs or increase in performance requirement of the ECUincreases the complexity of the software. This in turn increases the power consumed byECU for executing the programs.
This also requires the development ofcentralized hardware and hierarchical ECU architecture that runs software whichis hardware independent. However, use of multicore ECUs with parallelexecution can reduce the software complexity thereby reducing the powerconsumption. This in-turn requires theconcept of operating systems and middleware IMPACTOF SCHEDULERS ON ECUs Scheduler is the part ofoperating system that decides which tasks should be executed next in aprocessor. The decisions are made based on some policies which is termed asscheduling algorithm. Thus, the applications running on multiple cores share resources,which require an efficient scheduler to schedule the tasks.
This ensures thelogical correctness of the results of the tasks, prevention of shared resourcesfrom corrupting, and helps in effective utilization of the cores withoutleaving any core to be idle. Without an optimal schedule, the average responsetime of the tasks or the logical correctness of the results may vary. This maylead to performance degradation or safety hazards. SCHEDULINGALGORITHMSIn a multi-tasking environment, when there are morethan task is ready to get executed, the operating system (OS) must decide whena task should be allotted to the CPU and also the duration for which it can beexecuted in CPU.
This decision is made by the part of OS named as scheduler using scheduling algorithms, the set of rules and policies used toallocate the tasks to the CPU. The basic idea is to create a task executionorder keeping the CPU busy, and to allocate the resources shared by the taskseffectively. In a real-time system, every time an event occurs, a task getsgenerated to handle the event. Usually, every real-time system consists ofnumber of real-time tasks. These tasks must execute and produce the resultsbefore a particular time, which is referred as deadline. Every task has different time bounds, and the consequenceof a task missing its deadline also differs from task to task. Thus, theobjective of real-time scheduling algorithms is to schedule tasks satisfyingthe timing constraints.REAL-TIMESCHEDULERSThe scheduler need not run continuously, it isinvoked by the operating system only during the scheduling points to make the scheduling decisions.
In uniprocessorenvironment, there is a single shared processor in which all the tasks areexecuted. In a multiprocessor environment, the tasks can be executed in severalavailable processors. Based on the number of processor available, thescheduling algorithms are divided into uniprocessor and multiprocessorscheduling algorithms.
UNIPROCESSORSCHEDULING ALGORITHMS Based on thescheduling points, the scheduling algorithms are classified as clock-driven,event-driven and hybrid. In clock-driven schedulers, the scheduling points are the interruptsfrom the clock source, e.g. Table-driven and cyclic scheduling. Thesealgorithms do not assign priorities to the tasks; instead it makes use of thetable which the order and time for execution of tasks based on the interruptfrom clock. These type of schedulers are easy to implement and efficient, henceused in embedded applications.
These kinds of schedulers are referred as offline schedulers, since the schedulingdecisions are made before the execution of the application.In event-drivenschedulers, the scheduling points are the occurrence of events like arrival ofa new task, completion of a task, interrupts or a task being preempted. Ascheduler can be preemptive or non-preemptive. A preemptive schedulerallows a task to get suspended while executing and start the execution of othertask based on the priority. Non-preemptive scheduler does not allow thesuspension of tasks once it starts executing.
The scheduler can use for static priority or dynamic priority while taking scheduling decisions. In staticpriority, the priorities if the task is set before execution and it remainsfixed for all the instances of the task. Rate Monotonic (RM) algorithm, thepriority of the tasks is based on its period. Deadline monotonic (DM)algorithm, the tasks are prioritized based on the deadline. In case of dynamicpriority, the priorities of the tasks are computed during execution and varyfor various instances of the same task. Earliest Deadline First (EDF) algorithm,the task with shorter deadline is given higher priority. Least Laxity First (LLF)algorithm, the task with minimum laxity is given higher priority.
In hybrid schedulers, the scheduling pointscan be both clock interrupts and events. Round robin scheduling algorithm, theready tasks are held in a circular queue and taken up in sequence. The tasksrun for fixed time intervals called time slices. If the task is not completedwith the allocated time slot, it is inserted again in the ready queue. Event-drivenand hybrid schedulers are referred as onlineschedulers, since the scheduling decisions are made during the execution of theapplication based on the current state of the system and tasks.MULTIPROCESSORSCHEDULING ALGORITHMS Multiprocessor schedulingalgorithms are divided into three categories as global scheduling, partitioned scheduling and hybrid scheduling. Among these, uniprocessor algorithms can beextended to be used in a multiprocessor environment under global andpartitioned algorithms.
Global Scheduling Algorithm In global scheduling approach, allthe tasks that are ready for execution are stored in a global queue which isshared by all the processors as shown in Fig. 1. If there are ‘n’ processors,the global scheduler will select ‘n’ high priority tasks from the queue forexecution. Fig. 1 Global Scheduling Thetasks can be preempted from one processor and its execution can be resumed insome other processor, this is known as taskmigration. Since there is no constraint on restricting the execution oftask in a particular processor, global scheduling approach provides good workloadbalance among the processors which may prevent tasks from missing deadline. Itlowers the average response time for the tasks, and simple to implement. Butthe task migration leads to overheads.
Global-EDF, global-RM, and global-DMalgorithms, the ‘n’ active high priority tasks will be selected for execution.Other examples are Proportionate fairness (PFair), Deadline Partitioningfairness (DP-Fair) algorithms which provides fluid schedules for the tasks.Partitioned Scheduling Algorithm In partitioned schedulingalgorithm, task set is divided among the processors and a local schedulingalgorithm is used for each processor to schedule the tasks assigned to thatprocessor. This approach reduces the multiprocessor scheduling problem into ‘n’uniprocessor problems as shown in Fig.2. That is, the tasks are staticallyassigned to the processors and the tasks are scheduled using uniprocessoralgorithms like RM, EDF, LLF, etc. Different uniprocessor algorithm can be usedfor each processor core after the tasks are partitioned; this providesisolation to different cores in the systems. But task migration is restrictedand each task is executed only in the assigned processor.
Hence, there are nooverheads due the migration of tasks between the processors which is advantageover the global scheduling approach. Hence this category of schedulingalgorithms is suggested by AUTOSAR, standardized software architecture for automotive. Task partitioning among theprocessors is NP-complete problem. Hence, the performance of partitionedalgorithms is limited by the performance of the bin-packing heuristic being followed. Bin-packing algorithms try topack the set of objects in the available set of bins.
Here, the bins arerepresented by the processors and the objects by tasks. The utilization of theprocessor is used to check if the processor is full or can take tasks further. So,they try to allocate tasks to each processor such that the utilization of theprocessor is 1. If task is allocated to a processor, it is allocated as awhole; that is, it cannot be split among different processors. In a system with ‘n’ processors, the bin-packingalgorithm cannot guarantee successful partitioning of the tasks if the totalutilization of the tasks is greater than (n+1)/2. Thus, in worst-case, only 50%capacity of the system will be utilized. The unused capacity cannot beexploited, which is on the disadvantages of this category of schedulingalgorithms.
Fig. 2 Partitioned Scheduling Some of the bin-packingapproaches that are widely used are FirstFit (FF), Next Fit (NF), Best Fit (BF), and Worst Fit (WF). A task is said to fit in a processor, if the processorutilization remains less than 1 after that task has been allocated to it. InFF, the tasks are assigned to the first available processor in which the taskcan be fit.
In NF, the tasks are tried to be allocated the processor to whichthe last task was assigned. If it cannot be fit in that processor, it will betried to fit in next processor. In WF, the task is fit into the processor whichhas the least utilization. In BF, it is the reverse of WF, the tasks are triedto fit in the processor which has maximum utilization.
If it cannot fit in thatprocessor, then the task is tried to fit in the processor which has next highestmaximum utilization and so on. Other approaches would be, Decreased FF,Decreased NF, Decreased BF, and Decreased WF. In each of these decreased versionsof bin-packing algorithms, the tasks are arranged in decreasing order of theirutilization and one of the methods is applied for partitioning the tasks.
Rate MonotonicFirst Fit (RMFF), in which the FF is the algorithm for task partitioning and RMis the local scheduler algorithm.Semi-Partitioned SchedulingAlgorithms Like partitioned schedulingalgorithms, semi-partitioned algorithms also requires an off-line taskpartitioning algorithm and a run-time scheduler to dispatch the tasks. As inpartitioned algorithm, the tasks are executed in on processor; but when thetask cannot be fit into any individual processor, it is split and allowed tomigrate between the processors as shown in Fig.3. The tasks migrate among theprocessor in such a way, neither they do not migrate back to the same processorin the same period nor they get executed in more than one processor at the sametime. The main idea of this type of algorithms is toglobally schedule the tasks that cannot be assigned to only one processorbecause of the limitations of the bin-packing heuristics, and to improve theutilization bound of the partitioned scheduling. Since task migration islimited, the overhead is less and increases the system utilization, which were thedisadvantages of global and partitioning algorithms. Fig.
3Semi-Partitioned SchedulingLLREF The Largest Local Remaining Execution First (LLREF) is a DP-Fairalgorithm, which is optimal for scheduling periodic tasks. It depends on the abstractioncalled Time and Local Execution TimeDomain plane (T-L Plane). The T-L planes are repeated over time, and ascheduling algorithm that can schedule tasks in single T-L plane can scheduletasks in all repeated planes. Fig.
4 shows the Kth T-L Plane. The status of each task isrepresented in the form of token inT-L plane. The location of the token describes the value of current time alongthe horizontal axis and value of the remaining execution time of the task alongthe vertical axis. The local remainingexecution time, rik(t) of the task Ti doesnot mean the deadline, it represents the time of the task that must be executedwithin this T-L plane. In every T-L plane, the task Ti must executefor its local execution time liwhich is equal to its utilization Ui multiplied by the lengthof the plane. When the time slice begins, in a system with ‘n’ processors, ‘n’tasks with largest local execution time will be selected for execution. Thetokens are capable of moving in two directions; when a task is selected for execution,it moves downwards diagonally as TN in Fig. 4, otherwise it movesvertically along x-axis as T1 in Fig.
4. Fig. 4 Kth T-L Plane The scheduling objective in Kth T-L Plane is that all the tokensrepresenting the tasks moves towards the rightmost vertex of the T-L plane;i.e., tfk. If all the tokens in the plane successfullyreach the vertex, it is said the tasks are locallyfeasible. If tasks are locally feasible, it will be scheduled in theconsecutive T-L planes. Local Laxityof task Ti is given as tfk – tcur – li, wherecurrent time is represented by tcur.
The oblique side of the T-Lplane is represented as no local laxitydiagonal (NLLD). Whenever a token reaches the NLLD, it means the laxity ofthe task has reached zero and it must be selected immediately for execution.Otherwise the tasks cannot achieve the local feasibility. There are two timeinstants when the scheduling decision has to be made in the T-L plane.
Bottom hitting event (B event), when thelocal remaining execution time of the tasks reaches zero as TN inFig. 4, so the task with next largest remaining execution time at this timeinstant can be selected for further execution. Ceiling hitting event (C event), when the token hits the NLLD as T1in Fig. 4, the task must be selected immediately for execution. The token ofthe tasks with zero local remaining execution time are said to be inactive, others are said to be active. Fig. 5 T-L plane for tasks T1, T2,and T3 Consider an example, with 3tasks T1, T2, and T3 to be executed on 2processors as shown in Fig.
5. Initially T1, T2 areselected for execution in processors ?1 and ?2, it can beseen that the corresponding tokens move diagonally downwards. Also the tokenrepresenting T3 moves vertically. After some time, C event occursbecause of T3, hence T2 is preempted and T3is executed on ?2.
Now, the token representing task T2starts moving vertically and token of task T3 starts movingdownwards, while the token of task T1 keeps moving downwards. Thenwhen B event occurs as T1 gets completed, again T2 isselected for execution.EKG EKG stands for EDF with tasksplitting and k processors in a Group. EDF with task splitting and k processorsin a Group (EKG) is also an optimal algorithm for periodic tasks. The tasks areassigned to the processors in such a way utilization of the processor does notexceed 100%.
The designer should select a parameter ‘k’, the value of whichshould be 1 < k < n; n is the number of processors in thesystem. The algorithm separates the tasks into light and heavy tasksbased on the value of a separator. Thevalue of the separator, SEP is given as (k/k+1) if k < n or 1 if k = n. Thetask is classified as heavy if Ci/Ti > SEP, else it is classified as light task. The heavy tasksare assigned to the dedicated processors, thus whenever a heavy task arrives,it is executed on the assigned processor. The light tasks are assigned to agroup of processors, with at most k processors in the group. The tasks areconsidered one by one, current processor is denoted by index p and the currenttask is denoted by index i. The main aim of the algorithm is to assign the taskthat is currently considered to current processor p, if the schedulability conditionfor EDF is true.
Else if the condition is false, the task is split into twoportions and assigned to the current processor p and processor p + 1. Then theprocessor with next highest index is considered for the rest of the tasks. Thisis similar to the next-fit bin packing algorithm, but the algorithm permitssplitting of tasks if the task cannot be assigned to a processor. Whenever atask arrives in that group of processors, all processors in that group executesthe dispatcher to select the tasks for execution.
Let T0 denotethe time when a task arrives and T1 denote the time when anyother task in that group arrives; then two time instants are calculated, Taand Tb, when tasks should be preempted. One of the split tasks is executedbefore Ta and the other split task is executed after Tb.During the time span, Ta, Tb) the non-split tasks arescheduled according to EDF. If a non-split task finishes its execution duringthis time span, then the next non-split task with the earliest deadline isselected for execution.PERFORMANCEMETRICSCPUUTILIZATIONIt refers to the proportion of the processor cycles,which each process consumes. It can be derived by dividing the sum of theexecution time of the each process executed in the core divided by the totalsimulation time.TASKPREEMPTIONAnoperation that interrupts the currently executing task, and assigns theprocessor to a higher priority task which is ready to execute, this is known astask preemption. The suspended task willbe inserted into the ready queue whichcan be resumed later.
TASKMIGRATION The tasks can be preempted fromone processor and its execution can be resumed in some other processor, this isknown as task migration.AVERAGERESPONSE TIME The response time of each taskis the time spent by the task from its release time till completion. That is,sum of the execution time of the task and the waiting time of the task in theready queue. Average response time is sum of response time of the tasks whichare scheduled divided by the total number of tasks being scheduled.DEADLINEMISS