LINQ to SQL not Suitable for LOB

In my previous post I mentioned that some post or blogs on the Internet are hugely misleading about the available technologies, since they tend to hide basic facts and focus on the superficial magic, which sometimes simply don’t matter when the technology is unusable.

In the company I work for, we rejected LINQ to SQL about a month ago, after trying to solve its biggest problem that of performance combined with thread safe when cashing the Data Context. So any info I have found from various sources are not available.

What is LINQ?

LINQ is basically a collection of extension methods to any Enumerable object.

What is LINQ to SQL?

LINQ to SQL is LINQ over the enumerable objects created while drag and dropping tables and procedures in a dbml file. This classes are known as Entities and the object that manages them as a DataContext.

My Sin

I really don’t like SQL. It reminds me of procedural programming which I stopped writing a decade ago. So any technology that will allow me not write SQL is more than welcome for me. So when I first started reading about LINQ to SQL, I started thinking at last an ORM from Microsoft herself.

My Sin was that in spite of the objections of our more experienced programmer, I was standing by LINQ to SQL, mainly using arguments that came from the notion that everything was great in LINQ to SQL. This notion existed because I believed the posts on the Internet.

One of the biggest objections was that of security. It was unacceptable for him, for an application to have access to the tables of a database. My lack of SQL knowledge, didn’t take that in mind, so as then, supposedly there is no security objection for the rest of the post.

First Impressions

At first I was really disappointed because there was no support for DataSet manipulation through LINQ to SQL. At first I hadn’t realized what LINQ is exactly, so I implemented a library that did this job. When building N-Tier applications, Typed DataSets are the most effective solution for the business model. I wrote about it here. Having wrote this library I was really convinced that we had a great tool for LINQ to SQL, and that it was the choice for our Data Access Layer.

Problems arising

Having spent time to build this library, it now came to check whether LINQ to SQL was the valid choice for out Application. So we created a huge table in SQL Server and started case testing and comparing with know DataSet methodology.

A Data Access Layer will be used by a Web or a Server Side of an application. In order to be thread safe, you need to create the Data Context with each call. But this is slow when data are huge, and there were some posts saying that there would be a way to cache the data context through a configuration setting. So I automatically assumed that caching and thread safe has been taken into account.

But that wasn’t the case. The sad truth is that, if you cache the Data Context you must create and maintain one for every thread or Http Context in order to make them thread safe, with a trick I read from someone else. Practically you don’t make them thread safe, but thread specific.

With caching, performance improved greatly. In some cases it was quicker than DataSet methodology. The main reason was that, while the queries run, the Data Context of each thread kept its entities in memory so no queries to server where required after a number of iterations. After this was noticed I immediately thought what about memory consumption in the server as the execution time passed. How the hell do you manage this side effect.

Conclution

LINQ to SQL might be very appealing when reading about simple objects and simple applications, but when the application gets big the coordination problems that might occur and produce data corruption, are clearly the death tomb of LINQ to SQL for LOB. And the sad part was that, I fell victim to all those glorious posts. I believe this is a risk that no company should ever take. Data integrity is something that one must never mess with. This includes the security objection mentioned above.

LINQ to SQL is a proof of concept. It is one of the things that in IT theory look great but when put in practice, it proves once again that theory sometimes doesn’t relate to practice.

Advertisements

19 comments

  1. Pingback: WCF Debugging and a WCF review « Alex Sarafian as Developer

  2. First, you can use stored procedures in LINQ to SQL, and I highly recommend you do. You will have much greater security, and won’t have the problem of the application needing direct access to the tables.

    Second, you CAN cache the results of the data context. When using stored procedures for LINQ to SQL, each procedure created in the designer will have a class it returns, with the name of the stored procedure followed by the suffix “Results”. For example, if you stored procedure is called “sproc_Users_Get”, the class created by the designer would be called “sproc_Users_GetResults”.

    In your code, when you need to cache the data, would do something like this:
    MyConnDataContext mydb = new MyConnDataContext();

    List ListToCache = mydb.sproc_Users_Get().ToList();

    Cache[CacheName] = ListToCache;

    Notice that I used the extension ToList() after the call to the stored procedure. ToList() does two things for you. First, it forces immediate execution of the sproc, and second it converts the results into a list of the underlying class type (in this case, sproc_Users_GetResults).

  3. Philip thanks for your comments.

    First of all.

    I knew about Stored Procedures but an ORM or an want to be ORM library should not require me to write SQL code at all. In a few cases , as mass updates then so be it, but generally what is the point of an ORM if I have to write Stored Procedures? As I mentioned in my article supposevly there is no security issue.

    Second

    I hadn’t seen your post about caching technics. I still believe that along execution time, each cached Data Context will grow in memory size and most importantly of all, they will have same data from the database in different instaces. What is the point of having a database when practicall all your data soon or later gets to be resided in memory, and in our case in multiplied.

    Let me tell you again. I was very fond of LINQ when I hit this obstacle. I could overlook the security issue, but data incosistency is something that no one takes a gumble with for obvious reasons.

  4. It appears you have completely forgotten Lazy Loading. With Linq to SQL you aren’t loading HUGE amounts of data, you are only loading “just enough” and the rest is loaded as you need it (and not at all if you don’t).

    Secondly – if you are loading enough data that you are concerned about the memory usage of your cache (it must be fairly large for this to be a real concern) you should probably consider pulling out the data in “pages” – for example retieve only the first 20 records and then retrieve the next 20 when needed and so on.

  5. FDumloa your comment is true based on the assumption that you are permantly connected to the database.

    In LOB that is not true, so you must load all the data you need and then deal with it. Even in the case of lazy loading ypu describe, as time passes the data context holds more and more data if you cache it.

    Who decides which entity to unload and which not?

  6. The DataContext does not contain a permanant connection to the database. It connects on demand (using the connection pool) when the data is needed. On persist methods (INSERT and UPDATE) Linq to SQL actually validates the data before sending it to SQL Servier too which avoids needless connections when you would just get a failure.

    You also do not need to make any tiers outside of your Data Access Layer aware of the concept of a DataContext because when you use a DataContext to retrieve a Linq to SQL entitiy it comes with built in understanding of the datacontext it is attached to (by means of a dynamically created proxy). This gives your entities the ability to retrieve data lazily while maintaining transparency and not exposing your data context to every tier of your application.

    In Hibernate you do have to either load all the data up front OR expose your Session to any tier that would need it – this is not the case with Linq to SQL.

  7. FDumlao I didn’t say that there is a permanent connection open.
    But for everything to work, all entities must be within a datacontext, because when they are not, re attaching them is a hard thing to do.

    The thing you say with the proxy I didn’t quite understand. Let me explain to you a case senarion and tell me whether LINQ can be used.

    I have a Client Server application. Services of the Server are behind an IIS. In a single action such as Load, I do not need a single entity, but a collection of them and other collections related to them. This is done perfectly by DataSet.

    How can I have these entities serialized to the Client, and use them as like they were in the serverside. How will they know the DataContext they are within? Am I missing something ? From what I have read, there was not any solution besides, serializing the entities and then reattaching them to the context. For this case I had written https://sarafianalex.wordpress.com/2008/04/21/typed-dataset-linq-entities/ , but then we stumble upon all the thread safe issues.

    I also did a simple benchmarking, and when the number of database rows where great (say a million and more), LINQ performed poorely in comparison to Data Access Layer and DataSets. At first I was connected to the database with wireless, and then did the same test locally which I saw an improvement, but still there was the thing with thread safe, if I choose to cache it.

  8. I see – so you want to be able to serialize your entity and send it to the client, have the client perform some manipulation on that entity and then send it back to the service layer to persist?

    I agree that that is definitely not the ideal case for Linq – however in one system we have built we are doing something similar.

    We provide a lightweight API using WCF and REST. This is used by client applications to retrieve entiteis and to update them – however the client does not provide us with an updated copy of the object in the update method. Instead they provide us with an ID for the entity they are updating as well as any fields they would like to update. At this time the entity is loaded on the server side, manipulated, and then persisted with the updated information.

    As for the fact that you are serializing the entity – if you are doing this in every case you should probably not use Lazy Loading. While Lasy Loading will definitely work… it is going to mean that every entity in your collection will hit the database for every single Lazy property in order to get what is needed for serialization. This works – but is definitley not recommended. If you now your data is destined for serialization, you should probably override this during your select.

    Linq to SQL may not be the exact right solution for your project – I certainly am not one who would say it’s the right solution for every project – because I honestly don’t believe that. It is important though that when you decide not to use a technology that you are absolutely certain that you are deciding not to use it for the right reasons. That way when future projects present themselves you will still have a good understanding of what is in your toolbelt.

    As for DataSets – they still have thier place and they are still the #1 method for Enterprise .NET developers to get access to data. If you are going to use datasets, don’t forget about Linq to Datasets which will make your life that much easier.

  9. When I did the research I had understood that ADO.NET Entity Framework is the base for LINQ to SQL. I mean entities and DataContext is the “to SQL” part, and LINQ is tha extention methods on entities collections.

    All you say is very interesting, but still I don’t believe in LINQ for LOB, either web or N-Tier. As much as I would like to use it, I really can’t see a way to trust data integrity with it. If I were ever to make a small application, I would deffinetelly use it, and If I chose to transfer I would use the library I mentioned.

    But as I said, for data sensitive applications, I wouldn’t take the risk. I would defenitelly use an ORM which doesn’t keep track of changes by means of DataContext class which is not made thread safe at least. I really can’t understand microsoft for this specific choice.

    If they had used the exact same mechanism(designer with drag and drop), but made the entities not dependented to a data context, we would have trouble free remoting, no need to thread safe and a rad mechanism for DAL, just like a pure ORM.

    Have a nice day. Here is midnight.

  10. It is likely you misunderstand the purpose of the DataContext. DataContext’s are cheap to construct. They keep change-tracking state so you’ll want a new one for each unit-of-work you process. You create DataContext instances and use them in the same way that you would use an ADO Connection object in your code today. You don’t keep them in global state to use across threads. Creating them will not largely impact your memory usage.

  11. Matt my first approach was to construct every time the Data Context.
    But we tried some queries in out database that has many records and compared the times with our previous data access layer which was dataset with stored procedure. The times were dissapointing. I can understand a mild perfomance hit. I believe every orm has one. Nothing comes good without a consequence. But the times were dissapointing.

    Then I tried to cache the DataContext and we say great perfonce improvent, even better than datasets. The main reason was that as the instance of the Datacontext remained alive, the entities were practically also cached so there was no database fectching again from the database. But these cached objectecs remain in memory, and this is the case when I’m talking about memory problems. The next step was to check about thread safe, and after some research we decided it was not worth the risk.

    I would really like to use LINQ, with DataSets connecting them with the library I created (which has a small performance pennalty from a little I tested it), but there were some serious objections form other stuff members about, Database security and data intergrity.

  12. A few things:

    0) FDumlao is right; most of these comments are really just distilled (and obnoxious) versions of things he said.

    1) It’s not a good idea to cache DataContexts, since that results in duplicated/inconsistent data. The behaviour of DataContext strongly implies that you just shouldn’t cache it (see 2a).

    2) As FDumlao says, when updating, just reload the objects via a new DataContext, update them, then commit. If performance is such a problem, then maybe the structure of the application needs a change?

    2a) Don’t forget that you can use DataLoadOptions on a DataContext to optimise the lazy loading behaviour. The DataContext can only have this set once (see 1).

    2b) It’s great that you’re measuring performance, but you should consider also checking what SQL LINQ is sending to the DB. Some queries or neglect of DataLoadOptions can cause LINQ to perform excessive DB accesses instead of just failing like straight SQL would have. (i.e. LINQ is “clever” enough to return the intuitively-correct results for a GROUP BY statement even when combined with TOP; unfortunately this results in O(n) for the query).

    3) I can’t imagine any reason to implement a typed dataset wrapper over the top of LINQ objects; for an N-tier model you should consider passing around custom Data Transfer Objects.

    3a) I don’t think DataSets are really the kind of Objects Microsoft were thinking of when they developed LINQ as an Object-Relational Mapping framework.

    4) Dismissing LINQ as unsuitable for (all?) LOB apps does a great disservice to its painless provision of ORM, intellisense, near-omnipresent querying, change detection, and compile-time checking of queries. These features have made writing data access logic more enjoyable and reliable than ever before.

    Of course there is some overhead to LINQ, and it’s not the best tool for every job, but it’s a great tool for many more jobs than stated by the original posting.

  13. I agree completely with your viewpoint. There is a very interesting product called DataObjects.NET (http://www.x-tensive.com) that takes a very different, object-first view of ORM. I’ve been using it for years at a number of different companies and have been very happy with both the ease of use and performance.

    When coding, one thinks in objects. The way the data persists should be transparent, and is truly deterministic once the object model is created. This product follows the same approach and I haven’t seen anyone else do so.

  14. Quote from Phil Factor from SQLServerCentral.com

    There was some hilarity amongst DBAs recently when Microsoft signaled the end for Linq to SQL (nee DLINQ) by confirming that Linq to Entities will be the recommended way to access relational data. One can sympathise; LINQ to SQL was a horrible ‘stopgap’ idea as it promoted the assumption that it was a good idea to run the application data model directly against a database schema, and make queries against its base tables. This is fine for the tiddling local databases but anathema for almost any corporate databases.
    LINQ to Entities, however, allows the application developer to write queries against a conceptual object model. Entity Framework creates a logical mapping between the conceptual object model used by the application, and the database schema. Because the database system is decoupled from conceptual object model, one can change the actual database system you use without requiring the application to be rewritten. The physical database system can be a production-standard system that requires that all traffic uses Stored Procedures. The application programmer is insulated from database changes, and vice versa. This abstraction layer above the database allows entity inheritance and entity composition. Everyone gains.
    Linq to SQL was always an interim solution. It is difficult to think of a long-term strategy for its development that does not converge with LINQ to Entities
    The problem for many application programmers is that they prefer a simple model. LINQ to SQL is easy to do. Entity Framework is a monster. There has always been an assumption that it was the ‘Impedence Mismatch’ itself that was the cause of the frustration of application programmers against database systems. Somehow, the thought went, if the application could have its conceptual object model, created and maintained by the application developers, then all would be well. It may be a mirage. When a conceptual object model comes up against the complexity and constraints of real business applications, things can quickly get out of hand. You can’t expect the average application programmer to be an accomplished data-modeler.
    As a way of outflanking NHibernate, Entity Framework is fine, but it is not yet a product that can provide a universal panacea.

  15. Thanks for the discussion people. I learnt more from watching a few experienced devs jostle about difference scenarios that I ever have from all the “reading about simple objects and simple applications”.

    Anybody know if Microsoft actually has places we can see the real stipped down version of opinion on their tools apart from random blogs?

    Ever since the day i reworked a chat app back through every deadend of the different forms of list controls until I dumped them all for repeaters in ASP.NET 1.1 (still do in most interface centric cases), I am getting increasingly old and cynical. With loooong weeks and many million user apps to look at I don’t have time to break down the accessibility marketing garbage through trial and error.

  16. “When a conceptual object model comes up against the complexity and constraints of real business applications, things can quickly get out of hand. You can’t expect the average application programmer to be an accomplished data-modeler.”

    nice quote, i will use that every time somebody says “what’s a solutions architect”?

  17. This made me laugh, “As a way of outflanking NHibernate, Entity Framework is fine, but it is not yet a product that can provide a universal panacea.”

    Your in IT and wanting a “universal panacea”? You must be new at all of this.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s