EF: db.SaveChanges() vs dbTransaction.Commit

Arjun Menon · Jun 13, 2018

I am fairly new to Entity Framework and I have a question about EF's db.SaveChanges(). From some posts and MSDN I learned that db.SaveChanges() by default applies all of its changes in a transaction. We can also create a transaction of our own using db.Database.BeginTransaction(), "db" being my context class object. So I have two questions:

  1. Which one should I use, and when?
  2. If I am inserting data into one table whose @@identity is a foreign key in the next table I insert into, is there any way to get the @@identity other than calling db.SaveChanges()? And if db.SaveChanges() is called inside a user-defined transaction scope, will it commit my changes to the DB?

Answer

Steve Py · Jun 14, 2018

Yes, if you explicitly wrap your context within a transaction such as .Net's TransactionScope, you can retrieve auto-generated IDs from entities after a .SaveChanges() call, without committing the scoped transaction.

using (var tx = new TransactionScope())
{
  using (var context = new MyDbContext())
  {
     var newEntity = populateNewEntity();
     context.MyEntities.Add(newEntity);
     context.SaveChanges();
     int entityId = newEntity.EntityId; // Fetches the identity value.
  }
} // tx.Complete() was never called, so disposing the scope rolls back the transaction. Entity not committed.

However, use this pattern only when absolutely necessary, and with caution. The example above is the common way of using TransactionScope, and the default isolation level of TransactionScope is "Serializable", which is the most pessimistic in terms of locking. Even moderate use of this pattern on systems with a number of concurrent operations/users will result in deadlocks and performance hits due to lock waits. So if you use a TransactionScope, be sure to specify an isolation level.
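As a minimal sketch of specifying the isolation level, the snippet below passes a TransactionOptions to the TransactionScope constructor. ReadCommitted and the 30-second timeout are assumptions chosen only for illustration, not recommendations from the answer; the EF calls are left out so the snippet runs with System.Transactions alone.

```csharp
using System;
using System.Transactions;

// Assumption: ReadCommitted and a 30 second timeout are illustrative values only.
var options = new TransactionOptions
{
    IsolationLevel = IsolationLevel.ReadCommitted,
    Timeout = TimeSpan.FromSeconds(30)
};

IsolationLevel observed;
using (var tx = new TransactionScope(TransactionScopeOption.Required, options))
{
    // The ambient transaction now uses the requested level instead of Serializable.
    observed = Transaction.Current.IsolationLevel;

    // context.SaveChanges() calls would go here.

    tx.Complete();
}

Console.WriteLine(observed);
```

Any DbContext work done inside the scope would enlist in the ambient transaction and so run at the level given in the options.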

The Distributed Transaction Coordinator (DTC) is useful in scenarios where you want to coordinate commits between databases or other Tx-bound operations. For instance, system A is saving changes and needs to coordinate an update/insert with system B through an API. A and B need to be configured to use DTC, but once that is done A can start a transaction, register it with DTC, and append the DTC token to the header for B's API. B can then find that token, create a TransactionScope linked to that token, and commit or roll back based on what A signals. This has an overhead cost, meaning transactions on both systems are open longer than usual. If it's necessary, then that is a cost of doing business. If it's not necessary, then it is a waste and a potential source of headaches.

One other reason that someone might look at using an explicit Tx is when they want to update FKs in a related entity. For example, creating an order has an option to create a new customer. The order has a customer ID, so we need to create the customer, get its ID to set on the order, then save the order. If the order save fails, then the customer creation should roll back.

using (var tx = new TransactionScope())
{
  using (var context = new MyDbContext())
  {
     var newCustomer = createNewCustomer(); // dummy method to indicate creating a customer entity.
     context.Customers.Add(newCustomer);
     context.SaveChanges();
     var newOrder = createNewOrder(); 
     newOrder.CustomerId = newCustomer.CustomerId;
     context.Orders.Add(newOrder);
     context.SaveChanges();
  }
  tx.Complete(); // TransactionScope has no Commit(); Complete() marks it to commit when the scope is disposed.
} 
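A TransactionScope commits only if Complete() is called before the scope is disposed; otherwise disposing it rolls the transaction back. The sketch below illustrates that with empty scopes, with no EF or database involved. RunScope is a hypothetical helper introduced only for this illustration.

```csharp
using System;
using System.Transactions;

// Hypothetical helper: runs an empty scope and reports how its transaction ended.
static TransactionStatus RunScope(bool callComplete)
{
    var status = TransactionStatus.Active;
    using (var tx = new TransactionScope())
    {
        // Record the final status when the ambient transaction finishes.
        Transaction.Current.TransactionCompleted +=
            (sender, e) => status = e.Transaction.TransactionInformation.Status;

        // context.SaveChanges() calls would go here.

        if (callComplete)
        {
            tx.Complete();
        }
    } // Dispose commits if Complete() was called, otherwise it rolls back.
    return status;
}

var committed = RunScope(callComplete: true);
var rolledBack = RunScope(callComplete: false);
Console.WriteLine($"{committed}, {rolledBack}");
```

Under these assumptions the first call should report Committed and the second Aborted, which is why the order example has to call Complete() after the second SaveChanges().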

With EF this scenario should be mitigated by using navigation properties with a relationship between order and customer. In this way you can create a customer, create the order, set the order's Customer reference to the new customer, add the order to the DbContext, and .SaveChanges(). This lets EF take care of going through the order, seeing the referenced customer, inserting that, associating the FK in the order, and committing the changes in one implicit Tx.

using (var context = new MyDbContext())
{
    var newCustomer = createNewCustomer();
    var newOrder = createNewOrder();
    newOrder.Customer = newCustomer;
    context.Orders.Add(newOrder);
    context.SaveChanges();
}

Update: To outline avoiding FK references in your entities... (many-to-one)

EntityTypeConfiguration for Order With FK in entity:

HasRequired(x => x.Customer)
  .WithMany(x => x.Orders) // Links to an element in the Orders collection of the Customer. If Customer does not have/need an Orders collection then .WithMany()
  .HasForeignKey(x => x.CustomerId); // Maps Order.Customer to use CustomerId property on Order entity.

EntityTypeConfiguration for Order With No FK in entity:

HasRequired(x => x.Customer)
  .WithMany(x => x.Orders)
  .Map(x => x.MapKey("CustomerId")); // Maps Order.Customer to use CustomerId column on underlying Order table. Order entity does not expose a CustomerId.

With EF Core -- From memory, may need to be updated.

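builder.HasOne(x => x.Customer) // EF Core has no HasRequired; "builder" is the EntityTypeBuilder<Order> passed to Configure().
  .WithMany(x => x.Orders) // Links to an element in the Orders collection of the Customer. If Customer does not have/need an Orders collection then .WithMany()
  .HasForeignKey("CustomerId") // Creates a shadow property where Entity does not have a CustomerId property.
  .IsRequired(); // Marks the relationship as required, as HasRequired does in EF6.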

Both approaches (with or without a mapped FK) work the same. The benefit of the second approach is that there is no confusion in the code about how to update or assess the customer reference for the order. For example, if you have both a Customer and a CustomerId on the Order, changing the CustomerId and calling SaveChanges does not move the order to a new customer; only setting the Customer reference does. Setting the Customer reference does not automatically update the CustomerId, so any code "getting" the customer ID via the CustomerId property on the order would still retrieve the old customer's ID until the entity is refreshed.

The important thing when using navigation properties is to leverage them with deferred execution or to eager-load them efficiently. For example, if you want to load a list of orders and include their customer name:

using (var myContext = new MyDbContext())
{
  var orders = myContext.Orders.Where(x => x.OrderDate >= startDate && x.OrderDate < endDate).ToList();
  return orders;
}

** Bad: If this is MVC/Web API, the serializer will take the orders collection and, while attempting to serialize it, hit every navigation property and attempt to load it. This triggers lazy-load calls one by one. So if Order has a Customer, that is a hit to the DB with "SELECT * FROM Customers WHERE CustomerId = 42". If Order has OrderLines, that is another query per order, such as "SELECT * FROM OrderLines WHERE OrderId = 17", and every navigation property on those lines triggers more of the same. Returning entities has a huge performance impact; just don't do it.

using (var myContext = new MyDbContext())
{
  var orders = myContext.Orders
    .Include(x => x.Customer)
    .Include(x => x.OrderLines)
    .Where(x => x.OrderDate >= startDate && x.OrderDate < endDate).ToList();
  return orders;
}

** Better, but still bad. You might only include the items you think you'll need, but the serializer will still fetch everything on the order. This comes back to bite you as entities are revised to include new links to data. Even if you Include everything this is wasteful if all you wanted was the Customer Name.

using (var myContext = new MyDbContext())
{
  var orders = myContext.Orders
    .Where(x => x.OrderDate >= startDate && x.OrderDate < endDate)
    .Select(x => new OrderLineViewModel 
    {
      OrderId = x.OrderId,
      OrderNumber = x.OrderNumber,
      OrderAmount = x.OrderAmount,
      CustomerName = x.Customer.Name
    }).ToList();
  return orders;
}

** This is the sweet spot with navigation properties and deferred execution. The SQL that gets run on the DB returns just those 4 columns from the related data. No lazy load hits, and you send across the wire just the amount of data you need.

Some might argue that if you commonly need a CustomerId reference from an Order, for example, having a CustomerId on the Order entity saves referencing the Customer. But as outlined above, that Id may not be reliable. By using deferred execution to let EF use the entities to populate the data you want, getting the customer IDs of orders is just a matter of including/selecting x.Customer.CustomerId, which includes just that desired column rather than loading the entire entity to get it.