Performance Evaluation of Real-Time Scheduling
Algorithms for Multicore Electronic Control Units in Automotive Applications



Internal Combustion Engines (ICE) used in
automobiles are responsible for pollution because of their emission of Greenhouse
Gases (GHG) into the environment. The number of automobiles has been increasing
with the growing population, which has led to increased pollution. To control
this, governments have introduced many automotive emission laws. In spite of
these efforts, the use of ICEs continues to be a cause of environmental
pollution. This fact, combined with advancements in technology, has driven the
evolution of Electric Vehicles (EV) and Hybrid Electric Vehicles (HEV). These
categories of vehicles have reduced environmental pollution and the dependency
of automobiles on fossil fuels. An EV uses electric power from a battery as its
major power source. However, the introduction of EVs involves many challenges
due to performance limitations and differences in the technologies being used.
These challenges have motivated research into improving the technologies used
in EVs and their performance.

Electronic Control Units (ECU) are the embedded
systems used in automobiles to control one or more electronic subsystems of the
vehicle. The major components of an ECU include a microcontroller which acts as
the core, memory for storing the control software, sensors for inputs, actuators
for outputs, and communication links for connecting to other ECUs. An ECU takes
inputs (analog or digital) from the different sensors and computes the outputs
(driver or logic outputs) using the stored software. There are different ECUs
for the different systems in a vehicle; examples include the Body Control Module
(BCM), Engine Control Module (ECM), Powertrain Control Module (PCM), and Vehicle
Control Module (VCM). Thus, the requirement for modern automobiles to be safe,
efficient, and comfortable is fulfilled by the use of ECUs. But customers demand
more sophistication in terms of comfort, safety, and infotainment, which
increases the complexity of the software. This leads to the need for
microcontrollers with higher computational power and also increases the number
of ECUs being used. An average ECU draws 200 mA, which corresponds to around
2.4 W of power; hence, with the increase in processing power and number of ECUs,
the power consumption increases. Thus, a main objective in the development of
electric vehicles is to save energy whenever possible.


In order to overcome the above issues, multiple
existing control systems are integrated into a single ECU with multiple cores.
Since the software that was running on multiple ECUs is integrated to run on
multiple cores of a single ECU, this provides higher performance. Also, the
number of communication interfaces required to connect the ECUs of different
control systems is reduced, and so is the traffic on the communication network.
That is, the introduction of multicore ECUs reduces the complexity of the
in-vehicle architecture. Applications such as engine control or infotainment may
require higher performance. Without adding any additional hardware, these
requirements can be met by parallelizing jobs on multiple cores. To improve
safety, safety-critical applications can be run redundantly on multiple cores,
and the output given by the majority of the cores is chosen as the final output.
In addition to these advantages, the number of ECUs is reduced, which in turn
reduces the power consumed by ECUs.


                Most of the systems in an
automobile are electronic control systems that use microcontrollers for
efficient operation. Thus, in an automobile, 30-40 percent of the total value is
contributed by embedded systems and software. The complexity of the software
depends on the criticality of the functionality being carried out. Thus, either
an increase in the number of ECUs or an increase in the performance requirements
of an ECU increases the complexity of the software. This in turn increases the
power consumed by the ECU for executing its programs. This also requires the
development of centralized hardware and a hierarchical ECU architecture that
runs hardware-independent software. However, the use of multicore ECUs with
parallel execution can reduce the software complexity, thereby reducing the
power consumption. This in turn requires the concepts of operating systems and
middleware.


                The scheduler is the part of the
operating system that decides which task should be executed next on a
processor. The decisions are made based on a set of policies termed the
scheduling algorithm. The applications running on multiple cores share
resources, which requires an efficient scheduler to schedule the tasks. This
ensures the logical correctness of the results of the tasks, prevents shared
resources from being corrupted, and helps utilize the cores effectively without
leaving any core idle. Without an optimal schedule, the average response time of
the tasks or the logical correctness of the results may suffer, which can lead
to performance degradation or safety hazards.


In a multi-tasking environment, when more than
one task is ready to be executed, the operating system (OS) must decide when a
task should be allotted to the CPU and for how long it may execute on the CPU.
This decision is made by the part of the OS named the scheduler, using
scheduling algorithms: the set of rules and policies used to allocate tasks to
the CPU. The basic idea is to create a task execution order that keeps the CPU
busy, and to allocate the resources shared by the tasks effectively. In a
real-time system, every time an event occurs, a task is generated to handle the
event. Usually, every real-time system consists of a number of real-time tasks.
These tasks must execute and produce their results before a particular time,
referred to as the deadline. Every task has different time bounds, and the
consequence of a task missing its deadline also differs from task to task.
Thus, the objective of real-time scheduling algorithms is to schedule tasks so
that their timing constraints are satisfied.


The scheduler need not run continuously; it is
invoked by the operating system only at scheduling points to make scheduling
decisions. In a uniprocessor environment, there is a single shared processor on
which all the tasks are executed. In a multiprocessor environment, the tasks
can be executed on several available processors. Based on the number of
processors available, scheduling algorithms are divided into uniprocessor and
multiprocessor scheduling algorithms.


 Based on the
scheduling points, scheduling algorithms are classified as clock-driven,
event-driven, and hybrid. In clock-driven schedulers, the scheduling points are
the interrupts from the clock source, e.g. table-driven and cyclic scheduling.
These algorithms do not assign priorities to the tasks; instead, they use a
table that specifies the order and time of execution of tasks based on the
clock interrupts. Schedulers of this type are easy to implement and efficient,
and hence are used in embedded applications. They are referred to as offline
schedulers, since the scheduling decisions are made before the execution of the
application.

In event-driven
schedulers, the scheduling points are the occurrences of events such as the
arrival of a new task, the completion of a task, interrupts, or a task being
preempted. A scheduler can be preemptive or non-preemptive. A preemptive
scheduler allows a task to be suspended while executing and starts the
execution of another task based on priority. A non-preemptive scheduler does
not allow the suspension of a task once it starts executing. The scheduler can
use static or dynamic priorities while making scheduling decisions. With static
priorities, the priority of a task is set before execution and remains fixed
for all instances of the task. In the Rate Monotonic (RM) algorithm, the
priority of a task is based on its period; in the Deadline Monotonic (DM)
algorithm, the tasks are prioritized based on their relative deadlines. With
dynamic priorities, the priorities of the tasks are computed during execution
and vary between instances of the same task. In the Earliest Deadline First
(EDF) algorithm, the task with the shorter deadline is given higher priority;
in the Least Laxity First (LLF) algorithm, the task with the minimum laxity is
given higher priority.
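The static and dynamic priority rules above can be sketched in a few lines of Python; the `Task` fields and the example values are illustrative assumptions, not part of any particular RTOS API:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    period: int        # task period; RM treats a shorter period as higher priority
    abs_deadline: int  # absolute deadline of the current instance, used by EDF

def pick_rm(ready):
    # Rate Monotonic: static priority, the task with the shortest period wins
    return min(ready, key=lambda t: t.period)

def pick_edf(ready):
    # Earliest Deadline First: dynamic priority, earliest absolute deadline wins
    return min(ready, key=lambda t: t.abs_deadline)

ready = [Task("T1", period=50, abs_deadline=90),
         Task("T2", period=20, abs_deadline=100)]
```

Here RM would dispatch T2 (shorter period) while EDF would dispatch T1 (earlier absolute deadline), illustrating how a static and a dynamic policy can disagree on the same ready queue.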

 In hybrid schedulers, the scheduling points can
be both clock interrupts and events. In the round robin scheduling algorithm,
the ready tasks are held in a circular queue and taken up in sequence. The
tasks run for fixed time intervals called time slices. If a task does not
complete within its allocated time slice, it is inserted again into the ready
queue. Event-driven and hybrid schedulers are referred to as online schedulers,
since the scheduling decisions are made during the execution of the application
based on the current state of the system and tasks.
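The round robin policy described above can be sketched as follows; the task names and execution times are made up for illustration:

```python
from collections import deque

def round_robin(tasks, time_slice):
    """Simulate round robin. tasks: dict of task name -> remaining execution
    time. Returns the order in which the tasks complete."""
    queue = deque(tasks.items())
    finished = []
    while queue:
        name, remaining = queue.popleft()
        if remaining > time_slice:
            # time slice expired before completion: requeue at the back
            queue.append((name, remaining - time_slice))
        else:
            finished.append(name)
    return finished
```

With `round_robin({"T1": 3, "T2": 1}, 2)`, T1 runs for one slice and is requeued, T2 completes within its slice, and T1 then finishes, so the completion order is T2 followed by T1.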


                Multiprocessor scheduling
algorithms are divided into three categories: global scheduling, partitioned
scheduling, and semi-partitioned (hybrid) scheduling. Among these, uniprocessor
algorithms can be extended for use in a multiprocessor environment under the
global and partitioned approaches.

Global Scheduling Algorithm

                In the global scheduling approach,
all the tasks that are ready for execution are stored in a global queue which
is shared by all the processors, as shown in Fig. 1. If there are 'n'
processors, the global scheduler will select the 'n' highest-priority tasks
from the queue for execution.

Fig. 1 Global Scheduling

A task can be preempted on one processor and its execution resumed on some
other processor; this is known as task migration. Since there is no constraint
restricting the execution of a task to a particular processor, the global
scheduling approach provides good workload balance among the processors, which
may prevent tasks from missing their deadlines. It lowers the average response
time of the tasks and is simple to implement, but task migration leads to
overheads. In the global-EDF, global-RM, and global-DM algorithms, the 'n'
highest-priority active tasks are selected for execution. Other examples are
the Proportionate Fairness (PFair) and Deadline Partitioning Fairness (DP-Fair)
algorithms, which provide fluid schedules for the tasks.
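The dispatch step of global EDF reduces to picking the 'n' earliest-deadline tasks from the shared queue; a minimal sketch, with hypothetical task names and deadlines:

```python
def global_edf(ready, n_processors):
    """Global EDF dispatch: select the n earliest-deadline tasks from the
    shared global queue. ready: list of (task_name, absolute_deadline)."""
    return [name for name, _ in sorted(ready, key=lambda t: t[1])[:n_processors]]

# Four ready tasks competing for two processors
ready = [("T1", 40), ("T2", 15), ("T3", 30), ("T4", 25)]
```

With two processors, T2 and T4 (the two earliest deadlines) are dispatched while the remaining tasks wait in the global queue.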

Partitioned Scheduling Algorithm

                In the partitioned scheduling
approach, the task set is divided among the processors and a local scheduling
algorithm is used on each processor to schedule the tasks assigned to it. This
approach reduces the multiprocessor scheduling problem to 'n' uniprocessor
problems, as shown in Fig. 2. That is, the tasks are statically assigned to the
processors and are scheduled using uniprocessor algorithms like RM, EDF, LLF,
etc. A different uniprocessor algorithm can be used for each processor core
after the tasks are partitioned; this provides isolation between the cores in
the system. Task migration is restricted, and each task is executed only on its
assigned processor. Hence, there are no overheads due to the migration of tasks
between processors, which is an advantage over the global scheduling approach.
For this reason, this category of scheduling algorithms is recommended by
AUTOSAR, the standardized software architecture for automotive systems.

                Task partitioning among the
processors is an NP-complete problem. Hence, the performance of partitioned
algorithms is limited by the performance of the bin-packing heuristic being
used. Bin-packing algorithms try to pack a set of objects into an available set
of bins. Here, the bins are represented by the processors and the objects by
the tasks. The utilization of a processor is used to check whether the
processor is full or can take further tasks, so the heuristics try to allocate
tasks to each processor such that its utilization approaches 1. A task
allocated to a processor is allocated as a whole; that is, it cannot be split
among different processors. In a system with 'n' processors, a bin-packing
algorithm cannot guarantee successful partitioning of the tasks if the total
utilization of the tasks is greater than (n+1)/2. Thus, in the worst case, only
50% of the capacity of the system is utilized. The unused capacity cannot be
exploited, which is one of the disadvantages of this category of scheduling
algorithms.

Fig. 2 Partitioned Scheduling

                Some widely used bin-packing
approaches are First Fit (FF), Next Fit (NF), Best Fit (BF), and Worst Fit
(WF). A task is said to fit in a processor if the processor utilization does
not exceed 1 after that task has been allocated to it. In FF, each task is
assigned to the first processor in which it fits. In NF, a task is first tried
on the processor to which the last task was assigned; if it does not fit there,
it is tried on the next processor. In WF, the task is placed on the processor
that has the least utilization. BF is the reverse of WF: the task is first
tried on the processor with the maximum utilization; if it does not fit there,
the processor with the next highest utilization is tried, and so on. Other
approaches are Decreasing FF, Decreasing NF, Decreasing BF, and Decreasing WF.
In each of these decreasing versions of the bin-packing algorithms, the tasks
are first arranged in decreasing order of their utilization and then one of the
above methods is applied for partitioning. An example is Rate Monotonic First
Fit (RMFF), in which FF is the task partitioning algorithm and RM is the local
scheduling algorithm.
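A first-fit partitioner over task utilizations might look like the following sketch (the utilization values in the example are invented for illustration):

```python
def first_fit_partition(utilizations, n_processors):
    """Assign each task (given by its utilization) to the first processor
    whose total utilization stays at most 1.0 after the assignment.
    Returns a list of task indices per processor, or None if some task
    fits nowhere (the bin-packing heuristic fails)."""
    load = [0.0] * n_processors
    assignment = [[] for _ in range(n_processors)]
    for i, u in enumerate(utilizations):
        for p in range(n_processors):
            if load[p] + u <= 1.0:       # the task "fits" on processor p
                load[p] += u
                assignment[p].append(i)
                break
        else:
            return None                  # no processor can take this task whole
    return assignment
```

For utilizations [0.6, 0.6, 0.3] on two processors, task 0 goes to the first processor, task 1 does not fit there and goes to the second, and task 2 fits back on the first; note that tasks are never split, as described above.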

Semi-Partitioned Scheduling

                Like partitioned scheduling
algorithms, semi-partitioned algorithms also require an off-line task
partitioning algorithm and a run-time scheduler to dispatch the tasks. As in
the partitioned approach, most tasks are executed on one processor; but when a
task cannot fit into any individual processor, it is split and allowed to
migrate between processors, as shown in Fig. 3. The tasks migrate among the
processors in such a way that they neither migrate back to the same processor
within the same period nor execute on more than one processor at the same time.

The main idea of this type of algorithm is to globally schedule the tasks that
cannot be assigned to a single processor because of the limitations of the
bin-packing heuristics, and thereby to improve the utilization bound of
partitioned scheduling. Since task migration is limited, the overhead is low
and the system utilization increases, addressing the respective disadvantages
of the global and partitioned approaches.

Fig. 3 Semi-Partitioned Scheduling


                The Largest Local Remaining Execution First (LLREF) algorithm
is a DP-Fair algorithm which is optimal for scheduling periodic tasks. It
depends on an abstraction called the Time and Local Execution Time Domain plane
(T-L plane). The T-L planes repeat over time, and a scheduling algorithm that
can schedule tasks in a single T-L plane can schedule tasks in all the repeated
planes. Fig. 4 shows the Kth T-L plane.

                The status of each task is
represented in the form of a token in the T-L plane. The location of the token
gives the current time along the horizontal axis and the remaining execution
time of the task along the vertical axis. The local remaining execution time,
rik(t), of task Ti does not denote the deadline; it represents the amount of
work of the task that must be executed within this T-L plane. In every T-L
plane, the task Ti must execute for its local execution time li, which is equal
to its utilization Ui multiplied by the length of the plane. When the time
slice begins, in a system with 'n' processors, the 'n' tasks with the largest
local remaining execution times are selected for execution. The tokens can move
in two directions: when a task is selected for execution, its token moves
diagonally downwards, as TN in Fig. 4; otherwise it moves horizontally,
parallel to the time axis, as T1 in Fig. 4.


Fig. 4 Kth T-L Plane

                The scheduling objective in the
Kth T-L plane is that all the tokens representing the tasks move towards the
rightmost vertex of the T-L plane, i.e., tfk. If all the tokens in the plane
successfully reach this vertex, the tasks are said to be locally feasible. If
the tasks are locally feasible, they can be scheduled in the consecutive T-L
planes. The local laxity of task Ti is given as tfk - tcur - li, where tcur
represents the current time. The oblique side of the T-L plane is called the no
local laxity diagonal (NLLD). Whenever a token reaches the NLLD, the laxity of
the task has reached zero and it must be selected immediately for execution;
otherwise the tasks cannot achieve local feasibility. There are two time
instants at which a scheduling decision has to be made within the T-L plane. At
a bottom hitting event (B event), the local remaining execution time of a task
reaches zero, as TN in Fig. 4, so the task with the next largest local
remaining execution time can be selected for execution. At a ceiling hitting
event (C event), a token hits the NLLD, as T1 in Fig. 4, and that task must be
selected immediately for execution. The tokens of tasks with zero local
remaining execution time are said to be inactive; the others are active.

Fig. 5 T-L plane for tasks T1, T2,
and T3

                Consider an example with three
tasks T1, T2, and T3 to be executed on two processors, as shown in Fig. 5.
Initially, T1 and T2 are selected for execution on processors P1 and P2, and it
can be seen that the corresponding tokens move diagonally downwards, while the
token representing T3 moves horizontally. After some time, a C event occurs
because of T3; hence T2 is preempted and T3 is executed on P2. Now, the token
representing T2 starts moving horizontally and the token of T3 starts moving
downwards, while the token of T1 keeps moving downwards. Then, when a B event
occurs as T1 completes, T2 is again selected for execution.
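At the start of each T-L plane, LLREF's selection rule reduces to computing each task's local execution time li = Ui × (plane length) and picking the 'n' largest. A minimal sketch under hypothetical utilizations:

```python
def llref_select(utilizations, plane_length, n_processors):
    """Compute each task's local execution time l_i = U_i * plane_length and
    return (indices of the n tasks selected to run first, all local budgets)."""
    local = [(i, u * plane_length) for i, u in enumerate(utilizations)]
    ranked = sorted(local, key=lambda x: x[1], reverse=True)
    return [i for i, _ in ranked[:n_processors]], dict(local)
```

For utilizations 0.5, 0.7, and 0.4 with a plane of length 10 on two processors, the local budgets are 5, 7, and 4 time units, so tasks 1 and 0 run first while task 2's token moves horizontally until a B or C event triggers the next scheduling decision.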


                EKG stands for EDF with task
splitting and k processors in a Group; it is also an optimal algorithm for
periodic tasks. The tasks are assigned to the processors in such a way that the
utilization of a processor does not exceed 100%. The designer selects a
parameter k, where 1 ≤ k ≤ n and n is the number of processors in the system.
The algorithm separates the tasks into light and heavy tasks based on the value
of a separator. The value of the separator, SEP, is given as k/(k+1) if k < n,
or 1 if k = n. A task is classified as heavy if Ci/Ti > SEP; otherwise it is
classified as light. The heavy tasks are assigned to dedicated processors;
thus, whenever a heavy task arrives, it is executed on its assigned processor.
The light tasks are assigned to groups of at most k processors. The tasks are
considered one by one, with the current processor denoted by index p and the
current task by index i. The algorithm tries to assign the current task to the
current processor p if the EDF schedulability condition holds. If the condition
is false, the task is split into two portions assigned to processor p and
processor p + 1, and the processor with the next index is then considered for
the remaining tasks. This is similar to the next-fit bin-packing algorithm,
except that it permits splitting a task that cannot be assigned to a single
processor. Whenever a task arrives in a group of processors, all processors in
that group execute the dispatcher to select the tasks for execution. Let T0
denote the time when a task arrives and T1 the time when any other task in that
group arrives; two time instants, Ta and Tb, are then calculated at which tasks
should be preempted. One portion of a split task is executed before Ta and the
other portion after Tb. During the interval [Ta, Tb], the non-split tasks are
scheduled according to EDF. If a non-split task finishes its execution during
this interval, the non-split task with the next earliest deadline is selected
for execution.
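The EKG separator and the heavy/light classification follow directly from the definition above; in this sketch the task parameters are hypothetical:

```python
def ekg_classify(tasks, k, n):
    """tasks: list of (C, T) pairs (execution time, period).
    SEP = k/(k+1) if k < n, else 1; a task is heavy if C/T > SEP."""
    sep = k / (k + 1) if k < n else 1.0
    heavy = [(c, t) for c, t in tasks if c / t > sep]
    light = [(c, t) for c, t in tasks if c / t <= sep]
    return sep, heavy, light
```

With k = 2 and n = 4, SEP = 2/3: a task with C/T = 0.8 is classified heavy and gets a dedicated processor, while a task with C/T = 0.5 is light and is packed into a group of at most k processors.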



Core Utilization

Core utilization refers to the proportion of the processor cycles which each
process consumes. It is derived by dividing the sum of the execution times of
the processes executed on the core by the total simulation time.
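As a one-line sketch of this metric (the numbers in the example are illustrative):

```python
def core_utilization(execution_times, simulation_time):
    # Busy fraction of the core: total executed time / total simulated time
    return sum(execution_times) / simulation_time
```

If tasks ran for 20, 30, and 10 time units on a core during a 100-unit simulation, the core utilization is 0.6.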


Task Preemption

The operation that interrupts the currently executing task and assigns the
processor to a higher-priority task that is ready to execute is known as task
preemption. The suspended task is inserted into the ready queue and can be
resumed later.



Task Migration

                A task can be preempted on one processor and its execution
resumed on some other processor; this is known as task migration.


Average Response Time

                The response time of a task is the time spent by the task from
its release until its completion; that is, the sum of the execution time of the
task and its waiting time in the ready queue. The average response time is the
sum of the response times of the scheduled tasks divided by the total number of
tasks scheduled.
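This definition translates directly into code; the release and completion times below are made up for the example:

```python
def average_response_time(tasks):
    """tasks: list of (release_time, completion_time) pairs.
    Response time = completion - release; returns the mean over all tasks."""
    responses = [done - release for release, done in tasks]
    return sum(responses) / len(responses)
```

Two tasks released at times 0 and 5 and completing at times 12 and 11 have response times 12 and 6, giving an average response time of 9.0.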