Sift Data cleaning for CRM imports

Excel broke my file

Excel ruined your CSV. Here's why it happens every time, and the fix that sticks.

Leading zeros vanish from postcodes, phone numbers become 4.47911E+11, the 7th of May turns into the 5th of July, and José arrives as José. These are all the same bug: a CSV stores no types, so Excel guesses, and saving bakes the guess in. Some of the damage is repairable, some is only preventable, and knowing which is which matters.

Repair your file in Sift, free →

The root cause, once

A CSV is plain text. There is no "this column is text" flag inside it, so every time Excel opens one it re-guesses each column's type from the values. Anything that looks numeric becomes a number: zip codes lose their zeros, long phone numbers switch to scientific notation, "MAR1" becomes the 1st of March. Format the column as Text and the fix holds, until anyone (you, a colleague, next month's you) double-clicks the CSV again and the guessing starts over. That's why forum threads titled "how do I stop Excel from ruining my data" never really get a formula answer: the durable fix is to clean the file outside Excel and never round-trip it through Excel again before the import.

Damage 1: leading zeros stripped (zip codes, IDs, phone numbers)

Before ExcelAfter Excel
01234 (Massachusetts zip)1234
0044 7911 123456447911123456
0001542 (record ID)1542

The manual fix: import the CSV via Data → From Text/CSV and set the column to Text before it loads, every single time, on every machine that touches the file.

Repairable? Often, yes, when the correct form is knowable. Sift's postcode normalization re-pads US zips to five digits and fixes UK and Canadian postcode shapes, and its phone normalization rebuilds international prefixes. A stripped zero on an arbitrary internal ID, though, is only recoverable if you know the intended width; Sift flags those for review rather than guessing.

Damage 2: scientific notation (long phone numbers and IDs)

Before ExcelAfter Excel
4479111234564.47911E+11
001234567890123456781.23457E+17

Repairable? Sometimes, and this is the honest part most pages skip. It depends on whether Excel wrote out every digit. "4.47911123456E+11" spells out all twelve digits, so Sift re-expands it to 447911123456, losslessly and automatically. But a display-truncated "4.47911E+11" only carries six of its twelve digits; the rest are gone from that file, and no tool can restore what isn't there; anyone claiming otherwise is guessing. Sift detects scientific notation in every column, re-expands only the fully specified values, and puts the truncated ones on the "needs your eyes" list instead of quietly treating garbage as data. For those, go back to the source: while the mangled file is still open in Excel, the digits are intact, so close without saving and clean a fresh copy outside Excel. Salesforce IDs mangled this way are a classic source of MALFORMED_ID import errors.

Damage 3: dates silently rewritten

Source meantExcel made it
07/05/2026 (7 May, UK)5 July 2026 (US locale)
MAR1 (product code)01-Mar
2026-05-0745784 (serial number)

Repairable? Mostly. Sift standardizes the column to an unambiguous format (YYYY-MM-DD, or the exact format your CRM template requires) and, crucially, previews the ambiguous rows, the ones where day and month are both 12 or under, so you decide 01/02 once, for the whole file, instead of Salesforce deciding it for you. Dates already collapsed to serial numbers are converted back. A product code that became "01-Mar" needs the original source; flag it, don't guess.

Damage 4: accents garbled on save (mojibake)

BeforeAfter
José GarcíaJosé García
MüllerMüller

Repairable? Yes, and this is the one almost nobody tells you. Every guide says "save as CSV UTF-8 next time", which does nothing for the broken file in front of you. But the corruption is mechanical: UTF-8 bytes read as the wrong encoding produce predictable garbage, so the transformation can be reversed. Sift detects the double-encoding pattern and repairs é back to é, in the browser. HubSpot's "something went wrong" upload error and German-character import threads are this exact bug.

The fix that sticks

  1. Stop round-tripping through Excel. Get the rawest export you can from the source system and treat it as read-only.
  2. Clean it in Sift: repair mojibake, re-pad postcodes, re-expand fully specified scientific notation, normalize phones to E.164 and dates to your CRM's format, with a before/after diff for every change and a review list for anything unrecoverable.
  3. Export and import directly. The clean file goes straight to the CRM import screen. If Excel never opens it, Excel can't re-ruin it.
Sift is a static web app with no backend: your file is processed entirely on your device and never uploaded. Once the page has loaded it even works offline, which is an easy way to verify the privacy claim for yourself.

Related guides