Neil Parris
Neil Parris is CCI Product Manager for the ARM processor division responsible for interconnect products.

How to solve the IP integration problem with David Murray

March 10th, 2015 by Neil Parris

Hello all, I’m Neil Parris, a senior product manager at ARM. I’ll be blogging from time to time about issues surrounding EDA, and IP integration in particular. I hope to provide some valuable insight into the sometimes murky world of SoC development. Please enjoy the content and don’t hesitate to leave comments or ask questions. My blogging debut on this platform comes in the form of an interview: I sat down and chatted with David Murray. If you don’t know him, David joined ARM last year as part of the Duolog acquisition and is working as an IP Tooling Architect. He is incredibly enthusiastic and articulate, so it is always a pleasure to speak with him.

Hi David, what’s going on in the IP integration space?

Well, IP integration continues to be a key challenge in SoC development. Consistent increases in IP reuse, IP configurability and system complexity, all within tightly bound schedules, compound the problem. The number of IP blocks in a system continues to grow, the complexity and configurability of each IP is growing, and the overall integration scope is growing as it affects more and more teams from front end to back end, e.g. software, RTL design, verification and physical implementation.

This is a problem area that we are very familiar with, and we have been architecting integration solutions for a number of years. One of the fundamental pillars of improving the IP integration process is the standardization of the data in the process. We need to standardize our IP data (particularly the interfaces) through the use of metadata. If an IP can communicate its interfaces in a standard way, the whole IP and SoC integration process becomes a lot easier. If we have a formal definition (in some metadata format) of all the interfaces of an IP, then we can apply more automated intelligence to how it should be hooked up, and enable other crucial flows. For example, it lets us identify AMBA interfaces, clocks, resets, interrupts, DMA, debug and trace interfaces, and so on. Also, it’s not just the hardware interfaces I’m talking about; it’s equally important to have a good view of the hardware/software interfaces, like the registers and memory maps within the IP.

So how is this interface information standardized?

Well, for me, the obvious first thing is for the IP to use industry-standard protocols as much as possible, such as AMBA (ACE, AXI, AHB, APB, etc.). These interfaces are quite configurable, so it’s important to be able to define their content and configuration in a metadata format. The main standard that the industry uses is the IP-XACT format, originally developed under the SPIRIT consortium but now developed under Accellera. IP-XACT is essentially a machine-readable (XML) format that can describe an IP’s interfaces and memory maps as well as its contents and connectivity. We are currently working within ARM to increase the standardization of ARM IP so that it will be easier to integrate.

As long as a design flow creates this IP-XACT then we can work from there and run queries on that IP-XACT. Because we know what the tool reads and interprets, we can work together with partners to help them define the necessary IP-XACT specs.
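As a rough illustration of what "running queries" on IP-XACT can look like, here is a minimal Python sketch that lists the bus interfaces of a component. The XML fragment is hand-written and heavily simplified (it is not schema-valid and not taken from any ARM deliverable), the component name and vendor strings are made up, and the element names follow my reading of the IEEE 1685-2014 flavour of IP-XACT. It is not how Socrates or any other tool does it.

```python
import xml.etree.ElementTree as ET

# Namespace used by the IEEE 1685-2014 version of IP-XACT.
NS = {"ipxact": "http://www.accellera.org/XMLSchema/IPXACT/1685-2014"}

# Hypothetical, simplified component description (illustration only).
COMPONENT_XML = """\
<ipxact:component xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014">
  <ipxact:vendor>example.org</ipxact:vendor>
  <ipxact:library>demo</ipxact:library>
  <ipxact:name>dma_engine</ipxact:name>
  <ipxact:version>1.0</ipxact:version>
  <ipxact:busInterfaces>
    <ipxact:busInterface>
      <ipxact:name>axi_m</ipxact:name>
      <ipxact:busType vendor="example.org" library="buses" name="AXI4" version="1.0"/>
      <ipxact:master/>
    </ipxact:busInterface>
    <ipxact:busInterface>
      <ipxact:name>apb_cfg</ipxact:name>
      <ipxact:busType vendor="example.org" library="buses" name="APB4" version="1.0"/>
      <ipxact:slave/>
    </ipxact:busInterface>
  </ipxact:busInterfaces>
</ipxact:component>
"""


def list_bus_interfaces(xml_text):
    """Return (interface name, bus type name, mode) for each bus interface."""
    root = ET.fromstring(xml_text)
    found = []
    for bus_if in root.findall("ipxact:busInterfaces/ipxact:busInterface", NS):
        name = bus_if.findtext("ipxact:name", namespaces=NS)
        bus_type = bus_if.find("ipxact:busType", NS)
        # The interface mode is expressed by which child element is present.
        mode = next(
            (m for m in ("master", "slave", "system")
             if bus_if.find(f"ipxact:{m}", NS) is not None),
            "unknown",
        )
        found.append((name, bus_type.get("name"), mode))
    return found


for name, bus, mode in list_bus_interfaces(COMPONENT_XML):
    print(f"{name}: {bus} ({mode})")
```

The point is simply that once the interfaces are in a machine-readable form, questions like "which AXI masters does this block have?" become a few lines of script instead of a trawl through the documentation.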

Fast IP integration requires standardized IP


That sounds great but how does it work with lots of different 3rd party IP or a partner’s internal IP?

ARM also produces IP-XACT standard bus definitions that can be downloaded from the ARM Website for anybody to use.  If other IP providers use these standard definitions then it will provide a much easier mechanism of connecting these IP in a sub-system or top-level.  Also, don’t forget that this is not just enabling more efficient integration – our partners will also benefit from the provision of better EDA solutions that can leverage this metadata.
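To make the benefit of shared bus definitions concrete, here is a small Python sketch of the kind of check a tool could make before hooking two interfaces together. It assumes, as a simplification, that two interfaces are connectable when they reference exactly the same bus definition (vendor, library, name, version) and have opposite roles. All names and values are hypothetical.

```python
from typing import NamedTuple


class BusType(NamedTuple):
    """Identifier of a bus definition (vendor, library, name, version)."""
    vendor: str
    library: str
    name: str
    version: str


class BusInterface(NamedTuple):
    owner: str        # component that owns the interface
    name: str         # interface name on that component
    bus_type: BusType
    mode: str         # "master" or "slave"


def can_connect(a, b):
    """Simplified rule: same bus definition, one master and one slave."""
    return a.bus_type == b.bus_type and {a.mode, b.mode} == {"master", "slave"}


# Hypothetical definitions: one shared, one private to a vendor.
SHARED_AXI4 = BusType("example.org", "buses", "AXI4", "1.0")
PRIVATE_AXI4 = BusType("vendor-x.example", "internal", "my_axi", "0.3")

cpu = BusInterface("cpu", "m0", SHARED_AXI4, "master")
ddr = BusInterface("ddr_ctrl", "s0", SHARED_AXI4, "slave")
third_party = BusInterface("codec", "s0", PRIVATE_AXI4, "slave")

print(can_connect(cpu, ddr))          # both use the shared definition
print(can_connect(cpu, third_party))  # private definition needs manual mapping
```

When the third-party block uses its own private definition, the tool cannot tell that it is really an AXI4 slave, and someone has to map the ports by hand.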

So there’s a lot of work being done in ARM at the moment?

This is something we’ve been working towards for the past 6 or 7 years, even within Duolog, because there is huge potential for reducing bugs and streamlining design and verification processes.  At the moment we’re very focused on increasing the level of standardization within ARM IP and even have an internal IP-XACT modelling definition group. We’re creating new bus definitions, new extensions and guidelines on IP-XACT usage and of course we’re leveraging ARM Socrates to create better IP-XACT flows. Now that we’re part of ARM we have a sub-system and SoC-level perspective so we’ve become avid consumers of IP-XACT which gives us a good feel for what our partners are experiencing. The main challenges that we face are probably fairly common in the industry. We’re trying to standardize all of the interfaces that we need in metadata format but firstly we need to understand all of the different stakeholders.

This is great because once we have standardized IP interfaces, the integration and verification process becomes significantly faster. However, while you can standardize most interfaces with a minimum of fuss, we’re seeing a lot of IP blocks these days that can be tweaked in many different ways. In some ways we see IP configurability as the biggest integration challenge.

The level of IP configuration that is available these days poses a problem for integration, because you can take the same IP block and configure it in different ways, and it will look and act totally differently depending on that configuration. That makes it more difficult to integrate successfully into a system.

I liken this to a mixing desk that you would have in a recording studio, with hundreds of switches that can be turned one way or another to affect performance. The options enable designers to optimize their IP, but the amount of choice can also be confusing. What the user really wants is to be presented with the best configuration options for that particular IP block, ones that reflect the system constraints.


Multiple configuration options can often leave designers confused

So how are you tackling this configuration problem?

When we talk about IP configuration in general, there are three different levels of configuration that an IP block can have. First off, you have what is called static IP, which cannot be configured at all. This is what you would call off-the-shelf IP. It was more common in the past, when you would purchase it for plug-and-play functionality. Nowadays even off-the-shelf IP requires a bit of user configuration for each individual design.

The second type of configurable IP is a simplified version that has a fixed set of parameters that can be set. Having said that, it can be a challenge to create a configurable IP. Say, for example, you have 10 or even 20 parameters: the number of possible combinations makes it difficult to guarantee that your IP will work for every single configuration. Validation teams and modelling will ensure that the IP works fine for the most probable scenarios, but it’s hard to test for everything. You only have a finite amount of verification resources to ensure that it is all tested rigorously.
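To put rough numbers on that, here is a quick Python calculation of the size of a configuration space. It assumes the parameters are independent and that every combination is legal, which real IP rarely satisfies, and the value counts are made up for illustration.

```python
from math import prod


def config_space_size(values_per_parameter):
    """Number of distinct configurations if every combination is legal."""
    return prod(values_per_parameter)


# Ten on/off switches, then twenty on/off switches.
print(config_space_size([2] * 10))   # 1024
print(config_space_size([2] * 20))   # 1048576

# A hypothetical mix: two switches, a 4-way, an 8-way and a 3-way option.
print(config_space_size([2, 2, 4, 8, 3]))   # 384
```

Even with only two settings per parameter, going from 10 to 20 parameters takes you from about a thousand configurations to over a million, which is why exhaustive verification quickly stops being realistic.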

The third type of configurable IP is heavily dependent on the system for its configuration. Examples would be system interconnects, debug and trace subsystems, power, clock and reset, interrupts, I/O and memory systems. They are super configurable, and the number of permutations means you need a different type of strategy to handle them properly. Ideally you would have some form of highly intelligent solution that can interpret the system requirements and interfaces so that users can easily configure these types of IP. These are the challenges that we have been working through, and we have been steering the Socrates design environment towards providing solutions in this area.

ARM already provides a lot of IP in this area, including bus interconnect IP such as ARM CoreLink NIC-400, ARM CoreLink CCI-400 Cache Coherent Interconnect and ARM CoreLink CCN-512 Cache Coherent Network, as well as ARM CoreLink GIC-500 and CoreSight Debug and Trace IP. These IP consume vast amounts of system connectivity: for example, a cascaded interconnect infrastructure and CoreSight Debug and Trace could account for upwards of 50% of a system’s connectivity. So in some ways highly configurable IP is one of the pillars of solving the integration problem.

The new key ingredient that we are bringing to the table is to help manage the configuration of these IP so that it is aligned with its system context. If we can understand the contents of a system and its different interface requirements we can help to guide the configuration of IP.

How do we do this? – By having all system components in a metadata format, of course, and to have intelligent flows that can extract this information and perform this guided configuration – really it’s intelligent IP Configuration.
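As a toy illustration of guided configuration, here is a Python sketch that derives the port list of an interconnect from the interfaces of the blocks around it. This is only a sketch of the idea under simple assumptions (one interconnect, one protocol, one port per interface). The `Interface` class, the `derive_interconnect_config` function and the example system are all hypothetical, and it says nothing about how Socrates does this internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """Interface metadata as it might be extracted from an IP description."""
    component: str
    name: str
    protocol: str     # e.g. "AXI4", "APB4"
    role: str         # "master" or "slave"
    data_width: int   # in bits


def derive_interconnect_config(interfaces, protocol="AXI4"):
    """Derive a port list for one interconnect from its system context."""
    relevant = [i for i in interfaces if i.protocol == protocol]

    def port(i):
        return {"connects_to": f"{i.component}.{i.name}",
                "data_width": i.data_width}

    return {
        "protocol": protocol,
        # A master on an IP block plugs into a slave port on the interconnect.
        "slave_ports": [port(i) for i in relevant if i.role == "master"],
        "master_ports": [port(i) for i in relevant if i.role == "slave"],
    }


system = [
    Interface("cpu", "m0", "AXI4", "master", 128),
    Interface("dma", "m0", "AXI4", "master", 64),
    Interface("ddr_ctrl", "s0", "AXI4", "slave", 128),
    Interface("uart", "cfg", "APB4", "slave", 32),
]
config = derive_interconnect_config(system)
print(len(config["slave_ports"]), len(config["master_ports"]))

# Adding a block to the system changes the derived configuration.
system.append(Interface("gpu", "m0", "AXI4", "master", 128))
updated = derive_interconnect_config(system)
print(len(updated["slave_ports"]), len(updated["master_ports"]))
```

The interconnect is never configured by hand here: its port count and widths fall out of the metadata of the blocks it connects, so a change to the system shows up in the configuration the next time it is derived.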

So this is how the IP integration problem can be solved?

Yes. The vision that we have been working towards with the ARM Socrates IP Tooling for the last number of years has been to create a ‘System in a Day’ by creating an intelligent IP configuration capability. Back when we first released Socrates, over 6 years ago, the integration task was taking people many months to get an initial RTL netlist and several more months thereafter to get a viable system up and running. With Socrates we began making significant reductions to that schedule, bringing it down to several weeks. We saw, however, that each piece of IP was designed, built and integrated independently of the others. So, for example, the interconnect was built from a specification, and then people attempted to integrate it into the system from the same (probably outdated) specification. The bottleneck of the ‘System in a Day’ was the creation and integration of these system-dependent IP. The solution that we homed in on was to seek an intelligent way of configuring these IP within the context of their system. We are arriving at a solution to the IP integration problem through intelligent configuration of the IP itself. I believe that configuring every aspect of the system correctly is a highly effective way of increasing its overall connectivity.

What we’re trying to do here is use the metadata to give a fast, correct configuration in a system context. What I mean by system context is that you can see how different system requirements have a knock-on effect on the configuration of each IP and the system as a whole. What that allows us to do is reduce the time that’s spent on actually integrating the parts into a system because 90% of that work will have been done through intelligent configuration. In order to realize our ‘System in a Day’ vision for IP integration we need to do it through intelligent configuration. You need to have a solution for these complex IP blocks so that they can reconfigure themselves as the system is being defined.

Partners have told us that just compiling a picture of how some of the more complex IP blocks fit into the system normally takes them several weeks. In the past they have had to go through the TRMs and specs to understand what is required for the system. We want to be able to provide this information instantly from the metadata of the IP in the system.

The IP integration problem will be solved through intelligent configuration



3 Responses to “How to solve the IP integration problem with David Murray”

  1. Shine Chung says:

    The first step is to standardize all interfaces among the “soft IPs”. The next step is to do the same for “hard IPs”. We developed state-of-art OTP IPs that are purely based on logic flow, which means we don’t have high voltage, repetitively read-verified, redundancy, and extremely high reliability as logic devices. OTP IPs can be used to repair caches for embedded processors.

    Shine Chung, Chairman of Attopsemi

  2. Rus Talisin says:

    Hard IP must define a timing jitter tolerance , a further problem in what context this definition exists.
    .. i challenge any Hard IP to solve real world problems .. i have a perfect project ..

    Reality of IP Wars .. everyone has a containerized solution , without timing for various contexts .. lets face it that is the main event ..1. where is the data ? 2. Applied Algorithm 3 how is it accessed after processing ?

    Does it stream ? How to prepare data to reduce access times ? What maths libraries ? How to optimise ?

    Inspiring to read kernel timing resolutions down to 1uS in CUDA .. a tolerance of 10 ~ 20 uS to “RT” ENTRY ..& EXIT routines .. may just be possible .. but NOT with ANY link to an underlying HOST OS.

    However the dismal (at least so far ) reading orients users to royalty IPs more-so than VHDL scratch built functions. there in there somewhere ..

    With a most annoying FILE based function & Kernel respository .. hard to divorce from HOST OS .. (PCIe transaction / port standards / et al .. ) dont need PCIe runtime .. even tho our board is a PCIe x 8 .. only enabled to the screen , once a focal point is found ..

    IT is most annoying that simulation is done on HOST rather than on the FPGA fabric .. based on Intel compiler maturity for libraries / node timing / et al .. this only shows a dated tool set , based on EVENT processing / of course all abstracted language is EVENT based ..

    areas of CUDA most interesting is the WARP data exchange , which is thread to thread ..& the NPP libraries .. there are many Image block features which promise updated data exposures when the video is replaced by relevant data ..

    4.3 Region-of-Interest (ROI)
    In practice processing a rectangular sub-region of an image is often more common than processing complete images. The vast majority of NPP’s image-processing primitives allow for processing of such sub regions also referred to as regions-of-interest or ROIs.

    All primitives supporting ROI processing are marked by a “R” in their name suffix. In most cases the ROI is passed as a single NppiSize struct, which provides the width and height of the ROI. The “start pixel” of the ROI is implicitly given by the image-data pointer.

    .. overhead on Kepler ? .. Implied addressing is an old ASSEM .. what overhead is added in Kepler and can 90 independent frames be indexed and updated by incoming data stream (formatted according to requirement) , each new frame set ( updated 10mS ) indexed from onchip cache ?

    Again CUDA seems transfixed to an underlying HOST OS , which for my apps require separation .. The OS is an UGLY mess .. only useful to display results ( and then only in convenient page formats , or at best VIA plugins / importing thru MATLab

    So from the onset .. our boards have dedicated DMA drivers not using the slow PCIe .. in case one asks how the data gets into the 16GB FPGA RAM buffer .. Full Development Software / Hardware supplied . Shine Chung is welcome to review ..

  3. Rus Talisin says:

    ” //.. most annoying that simulation is done on HOST rather than on the FPGA fabric // ” (to explain better :

    of course simulation must begin on the HOST, via file based IP kernel timing to build up the algorithm.

    what is missing AFTER the fabric is complete is a code trimming phase of retiming with FPGA in the loop.

    MATLAB is a HOSTED toolset , i am wondering if logic can be fully customized using VHDL & MATLAB using the FrameWork Logic toolset. Normally, MATLAB is a HOST file based or HOST i/O “engine”
    where “engine” is the software environment and computation libraries that MATLAB provides.

    As they are HOST oriented resource consuming memory structures ( malloc ) and must interoperate with OS kernel supervision , shared with the PC it becomes impossible to isolate FPGA design from these HOST ops.

    These sales statements hardly explain dependency : BSP is written by the board vendor, so has nothing to do with MATLAB (unless it conforms to a MATLAB SDK standard for i/O)

    MATLAB BSP supports real-time hardware-in-the loop development using the graphical, block diagram Simulink environment with Xilinx System Generator.

    Theres no reference to the overheads involved or dependencies created by either FrameWork or Xilinx SG

    Why are these issues sidestepped .. is it the “abstraction training” that draws blanks or other reluctance?
